Trees are important to our environment as they provide home and food to many different organisms; they also take up carbon dioxide and release oxygen into our ecosystem. Many different factors can determine trees' growth, but sunlight, water and nutrients are essential for their growth.
This project analyzes trees planted on the streets of Metro Vancouver and examines their growth based on the categorical variables provided.
import pandas as pd
import altair as alt
trees = pd.read_csv('small_unique_vancouver.csv',
usecols = ['neighbourhood_name', 'date_planted', 'diameter', 'genus_name', 'height_range_id', 'root_barrier'],
parse_dates = ['date_planted'])
trees.head()
| | neighbourhood_name | date_planted | diameter | genus_name | height_range_id | root_barrier |
|---|---|---|---|---|---|---|
| 0 | Riley Park | 2000-02-23 | 28.5 | ACER | 4 | N |
| 1 | Arbutus-Ridge | 1992-02-04 | 6.0 | PYRUS | 2 | N |
| 2 | Sunset | NaT | 12.0 | PINUS | 4 | N |
| 3 | Killarney | 1999-11-12 | 11.0 | FRAXINUS | 4 | N |
| 4 | Shaughnessy | NaT | 15.5 | AESCULUS | 4 | N |
Table 1: Tree dataset
Table 1 shows the first 5 rows present in the trees dataset.
The descriptions of the data were obtained from the source.
The data contains public trees planted across the City of Vancouver. It includes the characteristics, location, and classification of the trees. The data, however, was processed and reduced prior to the analysis. The 6 columns used in this report and their descriptions are as follows.
| Column | Description |
|---|---|
| neighbourhood_name | The neighbourhood in which the tree was planted |
| date_planted | The date the tree was planted in YYYYMMDD format |
| diameter | The diameter of the tree in inches at breast height |
| genus_name | The genus name of the trees |
| height_range_id | The height range of the tree in feet (0-10 ft = 1, 10-20 ft = 2, etc.) |
| root_barrier | Whether a root barrier was installed or not |
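The `height_range_id` encoding described in the table can be decoded back into a range in feet. The sketch below assumes that the 10 ft pattern continues for every ID and that an ID below 1 has no defined range; `height_range_bounds` is a hypothetical helper and is not part of the dataset or its documentation.

```python
from typing import Optional, Tuple


def height_range_bounds(height_range_id: int) -> Optional[Tuple[int, int]]:
    """Return the (lower, upper) height in feet for a height_range_id.

    Assumes each ID covers a 10 ft band (1 -> 0-10 ft, 2 -> 10-20 ft, ...).
    IDs below 1 have no band under this scheme, so None is returned.
    """
    if height_range_id < 1:
        return None
    upper = height_range_id * 10
    return (upper - 10, upper)


for hid in (1, 2, 4):
    print(hid, height_range_bounds(hid))
```

Because the column stores band IDs rather than measured heights, any average of `height_range_id` is an average of band numbers, not of feet.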
The statistical summary of the data is presented below.
trees.info()
print('\n')
trees.describe(include = 'all', datetime_is_numeric = True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   neighbourhood_name  5000 non-null   object
 1   date_planted        2363 non-null   datetime64[ns]
 2   diameter            5000 non-null   float64
 3   genus_name          5000 non-null   object
 4   height_range_id     5000 non-null   int64
 5   root_barrier        5000 non-null   object
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 234.5+ KB
| | neighbourhood_name | date_planted | diameter | genus_name | height_range_id | root_barrier |
|---|---|---|---|---|---|---|
| count | 5000 | 2363 | 5000.000000 | 5000 | 5000.00000 | 5000 |
| unique | 22 | NaN | NaN | 67 | NaN | 2 |
| top | Renfrew-Collingwood | NaN | NaN | ACER | NaN | N |
| freq | 384 | NaN | NaN | 1218 | NaN | 4679 |
| mean | NaN | 2003-09-06 04:03:08.912399488 | 12.340888 | NaN | 2.73440 | NaN |
| min | NaN | 1989-10-31 00:00:00 | 0.000000 | NaN | 0.00000 | NaN |
| 25% | NaN | 1997-11-06 00:00:00 | 4.000000 | NaN | 2.00000 | NaN |
| 50% | NaN | 2003-02-12 00:00:00 | 10.000000 | NaN | 2.00000 | NaN |
| 75% | NaN | 2009-11-17 00:00:00 | 18.000000 | NaN | 4.00000 | NaN |
| max | NaN | 2019-05-07 00:00:00 | 71.000000 | NaN | 9.00000 | NaN |
| std | NaN | NaN | 9.266600 | NaN | 1.56957 | NaN |
Table 2: Statistical review of the tree dataset
The revised dataset has a total of 5000 entries. All columns except date_planted have complete values. More than half of the date_planted values (2637 of 5000, about 53%) are missing. There are 3 categorical columns, 2 numerical columns, and 1 datetime column.
According to Table 2, the categorical columns have 22, 67, and 2 unique values. The neighbourhood with the most trees planted is Renfrew-Collingwood, the most common genus is Acer, and most trees (93%) do not have a root barrier installed.
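The shares quoted above follow directly from the counts in Table 2: (5000 - 2363) / 5000 is about 52.7% missing dates, and 4679 / 5000 is about 93.6% of trees without a root barrier. As a minimal sketch, the same proportions can be read off a DataFrame as shown below; the `sample` frame is a hypothetical four-row stand-in with made-up values, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the trees table; values are made up for illustration.
sample = pd.DataFrame({
    "date_planted": pd.to_datetime(["2000-02-23", None, "1999-11-12", None]),
    "root_barrier": ["N", "N", "Y", "N"],
})

# isna() gives a boolean column, so its mean is the fraction of missing values.
missing_share = sample["date_planted"].isna().mean()

# normalize=True turns the counts into proportions per category.
barrier_share = sample["root_barrier"].value_counts(normalize=True)

print(f"missing dates: {missing_share:.0%}")
print(barrier_share)
```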
trees['year_planted'] = trees['date_planted'].dt.year.astype('Int64')
trees = trees.drop(columns = ['date_planted'])
Since the analysis only looks at the year, a year_planted column with the nullable Int64 dtype was derived from date_planted, and date_planted was dropped. The missing values remain missing (<NA>) in year_planted and were not included in the analysis.
A root barrier is absent for 93% of the trees in the dataset. It would be interesting to see whether the growth of the trees is affected by the presence or absence of a root barrier.
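A quick way to check how the year conversion treats missing dates is to run it on a handful of values. The sketch below uses the five `date_planted` values shown in Table 1; the variable names are chosen here for illustration only.

```python
import pandas as pd

# The first five date_planted values shown in Table 1 (two are missing).
dates = pd.Series(pd.to_datetime(
    ["2000-02-23", "1992-02-04", None, "1999-11-12", None]))

# .dt.year yields floats when NaT is present; Int64 (capital I) is the
# nullable integer dtype, so missing years are stored as <NA>.
years = dates.dt.year.astype("Int64")
print(years)

# value_counts() skips <NA> by default, so missing years do not add a bar.
print(years.value_counts().sort_index())
```

With the lowercase `int64` dtype the conversion would raise an error on the missing values, which is why the nullable variant is needed here.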
barrier_plot = alt.Chart(trees).mark_bar().encode(
alt.X('mean(height_range_id)', title = "Average height range of trees (ft)", axis = alt.Axis(grid = False)),
alt.Y('root_barrier', title = "Root barrier", scale = alt.Scale(domain = list(trees["root_barrier"].unique()))),
alt.Color('root_barrier', legend = None, scale = alt.Scale(scheme = 'set2')),
tooltip = [alt.Tooltip("count():Q", title = "Number of trees"),
alt.Tooltip("mean(height_range_id):Q", title = "Average height range of trees (ft)", format='.2f')]
).properties(width = 300, title = "The presence of root barriers limits trees' growth")
barrier_plot
Figure 2: Comparison between average tree height range based on tree barrier
As shown in Figure 2, trees without root barriers have a higher average height range than trees with root barriers; their average is about twice as high. It appears that root barriers limit the roots from receiving the nutrients essential for the trees' growth.
If certain neighbourhoods have more trees compared to others, it could be an indication that those neighbourhoods have more of the resources required for trees' growth. To answer the question, it would be best to compare trees that were planted in the same year.
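The comparison in the bar chart is a difference in group means, and it helps to look at the group sizes alongside it, since Table 2 implies only 321 of the 5000 trees have a barrier. The sketch below shows one way to compute both with pandas; the `sample` rows are hypothetical and were chosen so that the ratio comes out at 2.

```python
import pandas as pd

# Hypothetical rows; the real data has far more trees without a barrier.
sample = pd.DataFrame({
    "root_barrier": ["N", "N", "N", "N", "Y", "Y"],
    "height_range_id": [4, 2, 3, 3, 1, 2],
})

# Mean height range and number of trees for each root_barrier group.
summary = sample.groupby("root_barrier")["height_range_id"].agg(["mean", "count"])

# Ratio of the two group means.
ratio = summary.loc["N", "mean"] / summary.loc["Y", "mean"]

print(summary)
print(f"no-barrier / barrier mean height range: {ratio:.2f}")
```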
trees_by_year = (
alt.Chart(trees)
.mark_bar(color = 'coral')
.encode(
alt.X('year_planted', title = "Year", scale = alt.Scale(nice = False),
axis = alt.Axis(labels = False, grid = False, tickSize = 0)),
alt.Y('count()', title = "Number of trees"),
tooltip = [alt.Tooltip("year_planted:Q", title = "Year"),
alt.Tooltip("count()", title = "Number of trees")])
.properties(width = 250)
)
trees_by_year.properties(title = "Number of trees planted between the years 1989 and 2019")
Figure 3: Number of trees planted for 1989-2019
Figure 3 shows the total number of trees planted each year between 1989 and 2019. The rows with missing date values are omitted from this plot. More trees were planted in the mid 1990s, while fewer trees were planted in the mid 2010s.
This plot will be used as a selection tool to compare the average tree height range of different neighbourhoods.
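The yearly counts behind a plot like this can also be tabulated directly in pandas. The following is a small sketch on hypothetical `year_planted` values; it is an equivalent tabulation, not the code used to draw the figure.

```python
import pandas as pd

# Hypothetical year_planted values, including missing entries.
years = pd.Series([1995, 1995, 1996, None, 2015, None, 1995], dtype="Int64")

# value_counts() drops <NA> by default, so rows without a year are left out.
per_year = years.value_counts().sort_index()
print(per_year)
```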
select_year = alt.selection_multi(encodings = ["x"], on = 'click', nearest = True)
year_click = (
trees_by_year.encode(
opacity = alt.condition(select_year, alt.value(0.8), alt.value(0.2)))
.add_selection(select_year)
.properties(width = 400, height = 100, title = {"text": "Number of trees planted between 1989 and 2019",
"subtitle": "Click to select year, hold shift to select multiple years, double-click to clear"})
)
nbh_trees = (
alt.Chart(trees)
.transform_filter(select_year)
.mark_bar()
.encode(
alt.X('neighbourhood_name', title = "Neighbourhood", axis=alt.Axis(labelAngle=-45), sort = '-y'),
alt.Y('trees_mean:Q', title = "Average height range (ft)"),
alt.Color(value = 'coral'),
tooltip = [alt.Tooltip('trees_mean:Q', title = "Average height range (ft)", format='.2f')])
.transform_aggregate(
trees_mean = 'mean(height_range_id)',
groupby = ['neighbourhood_name'])
.transform_window(
rank = 'rank(trees_mean)',
sort = [alt.SortField('trees_mean', order = 'descending')])
.transform_filter(alt.datum.rank <= 8)
.add_selection(select_year)
.properties(width = 400, height = 300, title = "Average height range per neighbourhood")
)
year_click & nbh_trees
Figure 4: Average height range of trees per neighbourhood based on year
The average height range of trees based on their year planted for each neighbourhood is presented in Figure 4. For each year, the neighbourhood with the highest height range changes; this could be due to the fact that trees were planted in different neighbourhoods in different years. Fairview, however, seems to appear frequently at the top of the average height range for the years in which it is present. This could indicate that Fairview has more resources available for trees' growth.
Tree growth is also dependent on how the trees utilize available resources. Each type of tree utilizes the resources differently; therefore, it would be interesting to compare how the trees grow based on the genus they belong to. Similar to the neighbourhood comparison method, comparing the genera based on year is ideal. Trees grow in two ways, primarily in their height and length, and secondarily in their thickness. As the height range is used to compare the neighbourhoods, diameter will be used to compare the genera.
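The neighbourhood chart chains four steps: filter by the selected year(s), average the height range per neighbourhood, rank the averages, and keep the top 8. A rough pandas equivalent is sketched below to make those steps explicit; `top_neighbourhoods` and the `sample` rows are hypothetical, and `rank(method="min")` is assumed to match the chart's ranking, where tied values share a rank.

```python
import pandas as pd

# Hypothetical rows standing in for the trees table.
sample = pd.DataFrame({
    "neighbourhood_name": ["Fairview", "Fairview", "Sunset",
                           "Sunset", "Killarney", "Riley Park"],
    "year_planted": [1995, 1996, 1995, 1995, 1995, 1996],
    "height_range_id": [5, 4, 2, 3, 3, 4],
})


def top_neighbourhoods(df, years, n=8):
    """Mean height range per neighbourhood for the given years, top n by rank."""
    # Step 1: keep only the selected years.
    selected = df[df["year_planted"].isin(years)]
    # Step 2: average the height range per neighbourhood.
    means = selected.groupby("neighbourhood_name")["height_range_id"].mean()
    # Step 3: rank the averages from highest to lowest; ties share a rank.
    ranks = means.rank(method="min", ascending=False)
    # Step 4: keep the neighbourhoods ranked n or better.
    return means[ranks <= n].sort_values(ascending=False)


print(top_neighbourhoods(sample, [1995], n=2))
```

Because the ranking is recomputed after the year filter, the set of neighbourhoods shown can change completely from one selection to the next, which is consistent with the changing leaders described above.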
genus_trees = (
alt.Chart(trees)
.transform_filter(select_year)
.mark_bar()
.encode(
alt.X('genus_name', title = "Genus", axis=alt.Axis(labelAngle=-45), sort = '-y'),
alt.Y('trees_mean:Q', title = "Average diameter (in)"),
alt.Color(value = 'coral'),
tooltip = [alt.Tooltip('trees_mean:Q', title = "Average diameter (in)", format='.2f')])
.transform_aggregate(
trees_mean = 'mean(diameter)',
groupby = ['genus_name'])
.transform_window(
rank = 'rank(trees_mean)',
sort = [alt.SortField('trees_mean', order = 'descending')])
.transform_filter(alt.datum.rank <= 8)
.add_selection(select_year)
.properties(width = 200, height = 300, title = "Average diameter per genus")
)
year_click.properties(width = 480) & (nbh_trees.properties(width = 200)| genus_trees)
Figure 5: Average growth of trees based on year
Based on Figure 5, there seem to be some genera with a higher average diameter, but as with the comparison between the neighbourhoods, the genus with the highest average diameter changes from year to year. The genera that frequently appear at the top are Acer, Prunus, Fagus, and Quercus. Figure 5 can also be used to compare the average diameter of trees in each neighbourhood in addition to the planted year.
Trees are an essential part of our ecosystem as they take up carbon dioxide and produce oxygen. They also provide home and food to many different organisms. Their growth is determined by many different factors, but the most essential factors are sunlight, water, and nutrients. In this project, the characteristics of the trees as well as possible factors affecting the trees' growth were analyzed.
In Figure 2, we saw the effect of root barriers, which 93% of the trees in the dataset do not have installed. The figure suggests that root barriers limit trees' growth, as trees without a root barrier have twice the average height range of trees with a root barrier.
Figure 3 shows the total number of trees planted each year between 1989 and 2019. It is not representative of the whole dataset, as over 50% of the entries in the date_planted column are missing. The number of trees planted each year appears to be quite consistent, with approximately 80 to 130 trees planted in most years. From this observation, it seems that most of the missing values belong to trees planted in 1989-1991 and 2015-2019, as these years have far fewer records than other years. The date_planted column starts in October 1989 and ends in May 2019, so for 1989 and 2019 alone a total of 17 months' worth of data are missing.
Figures 4 and 5 explore the averages of the height range and the diameter of the trees and compare them based on their location/neighbourhood and the genus of the trees. Using the year plot from Figure 3, we can select and filter the data for the year(s) and compare the averages by neighbourhood and by genus. In addition, Figure 5 allows us to look at the biggest trees present in each neighbourhood by simply selecting a neighbourhood. It also allows us to compare the sizes of each genus across neighbourhoods by selecting a genus.
To conclude, the growth of the trees does appear to be affected by different factors. Some neighbourhoods may have access to better sunlight, water, and nutrients compared to others; some genera may be able to utilize the resources more efficiently than others. For future studies, it would be of interest to test whether other factors/columns from the dataset affect the trees' growth. This analysis could have been improved if we could select root_barrier and year_planted together and filter the data to compare the averages of the trees' growth.
# Define the selection first so that all four charts share the same selection object.
select_year = alt.selection_multi(encodings = ["x"], on = 'click', nearest = True)
barrier_plot = (
alt.Chart(trees).mark_bar().encode(
alt.X('mean(height_range_id)', title = "Average height range of trees (ft)", axis = alt.Axis(grid = False)),
alt.Y('root_barrier', title = "Root barrier", scale = alt.Scale(domain = list(trees["root_barrier"].unique()))),
alt.Color('root_barrier', legend = None, scale = alt.Scale(scheme = 'set2')),
tooltip = [alt.Tooltip("count():Q", title = "Number of trees"),
alt.Tooltip("mean(height_range_id):Q", title = "Average height range of trees (ft)", format='.2f')])
.properties(width = 250, height = 80, title = "The presence of root barriers limits trees' growth")
.transform_filter(select_year)
.add_selection(select_year)
)
year_click = (
trees_by_year.encode(
opacity = alt.condition(select_year, alt.value(0.8), alt.value(0.2)))
.add_selection(select_year)
.properties(width = 250, height = 80, title = "Number of trees planted between 1989 and 2019")
)
nbh_trees = (
alt.Chart(trees)
.transform_filter(select_year)
.mark_bar()
.encode(
alt.X('neighbourhood_name', title = "Neighbourhood", axis=alt.Axis(labelAngle=-45), sort = '-y'),
alt.Y('trees_mean:Q', title = "Average height range (ft)"),
alt.Color(value = 'coral'),
tooltip = [alt.Tooltip('trees_mean:Q', title = "Average height range (ft)", format='.2f')])
.transform_aggregate(
trees_mean = 'mean(height_range_id)',
groupby = ['neighbourhood_name'])
.transform_window(
rank = 'rank(trees_mean)',
sort = [alt.SortField('trees_mean', order = 'descending')])
.transform_filter(alt.datum.rank <= 8)
.add_selection(select_year)
.properties(width = 250, height = 300, title = "Average height range per neighbourhood")
)
genus_trees = (
alt.Chart(trees)
.transform_filter(select_year)
.mark_bar()
.encode(
alt.X('genus_name', title = "Genus", axis=alt.Axis(labelAngle=-45), sort = '-y'),
alt.Y('trees_mean:Q', title = "Average diameter (in)"),
alt.Color(value = 'coral'),
tooltip = [alt.Tooltip('trees_mean:Q', title = "Average diameter (in)", format='.2f')])
.transform_aggregate(
trees_mean = 'mean(diameter)',
groupby = ['genus_name'])
.transform_window(
rank = 'rank(trees_mean)',
sort = [alt.SortField('trees_mean', order = 'descending')])
.transform_filter(alt.datum.rank <= 8)
.add_selection(select_year)
.properties(width = 250, height = 300, title = "Average diameter per genus")
)
((barrier_plot | year_click) & (nbh_trees | genus_trees))